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Abstract — We propose a novel technique to assess functional 
brain connectivity in EEG/MEG signals. Our method, called 
Sparsely- Connected Sources Analysis (SCSA), can overcome the 
problem of volume conduction by modeling neural data innova- 
tively with the following ingredients: (a) the EEG is assumed to 
be a linear mixture of correlated sources following a multivariate 
autoregressive (MVAR) model, (b) the demixing is estimated 
jointly with the source MVAR parameters, (c) overfitting is 
avoided by using the Group Lasso penalty. This approach allows 
to extract the appropriate level cross-talk between the extracted 
sources and in this manner we obtain a sparse data-driven model 
of functional connectivity. We demonstrate the usefulness of 
SCSA with simulated data, and compare to a number of existing 
algorithms with excellent results. 

I. Introduction 

A. Functional brain connectivity 

The analysis of neural connectivity plays a crucial role 
for understanding the general functioning of the brain. In 
the past two decades such analysis has become possible 
thanks to tremendous progress that has been made in the 
fields of neuroimaging and mathematical modeling. Today, a 
multiplicity of imaging modalities exists, allowing to monitor 
brain dynamics at different spatial and temporal scales. 

Given multiple simultaneously-recorded time-series reflect- 
ing neural activity in different brain regions, a functional (task- 
related) connection (sometimes also called information flow or 
(causal) interaction in this paper) between two regions is com- 
monly inferred, if a significant time-lagged influence between 
the corresponding time-series is found. Different measures 
have been proposed for quantifying this influence, most of 
them being formulated either in terms of the cross-spectrum 
(e.g., coherence, phase slope index [1]) or an autoregressive 
models (e.g., Granger causality [2], directed transfer function 
[3], partial directed coherence [4], [5]). 

B. Volume conduction problem in EEG and MEG 

In electroencephalography (EEG) and magnetoencephalog- 
raphy (MEG), sensors are placed outside the head and the 
problem of volume conduction arises. That is, rather than 
measuring activity of only one brain site, each sensor captures 
a linear superposition of signals from all over the brain. This 
mixing introduces instantaneous correlations in the data, which 
can cause traditional analyses to detect spurious connectivity 
[6]. 
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C. Existing source connectivity analyses 

Only recently, methods have been brought up, which qualify 
for EEG/MEG connectivity analysis, since they account for 
volume conduction effects. These methods can roughly be 
divided as follows. 

One type of methods aims at providing meaningful connec- 
tivity estimates between sensors. The idea here is, that only the 
real part of the cross-spectrum and related quantities is affected 
by instantaneous effects. Thus, by using only the imaginary 
part, many traditional coupling measures can be made robust 
against volume-conduction [1], [6]. 

Another group of methods attempts to invert the mixing 
process in order to apply standard measures to the obtained 
source estimates. These methods can be further divided into 
(i) source-localization approaches (where sources are obtained 
as solutions to the EEG/MEG inverse problem), (ii) methods 
using statistical assumptions, and (iii) combined methods. The 
first approach is pursued, for example, in [7], [8]. Methods 
in the second category can be appealing, since they avoid 
finding an explicit inversion of the physical forward model. 
Instead, both the sources and the (de-)mixing transformation 
are estimated. To make such decomposition unique, assump- 
tions have to be formulated, the choice of which is not so 
straightforward. We will now briefly review some possibilities 
for such assumptions. 

Principal component analysis (PCA) and independent com- 
ponent analysis (ICA) are the most prominent linear decom- 
position techniques for multivariate data. Unfortunately, these 
methods contradict either with the goal of EEG/MEG connec- 
tivity analysis (assumption of independent sources in ICjAQ) or 
even with the physics underlying EEG/MEG generation (as- 
sumption of orthogonal loadings in PCA). Nevertheless, both 
concepts have been successfully used in more sophisticated 
ways to find meaningful EEG/MEG decompositions. 

For example, an interesting use of ICA is proposed in [10]. 
The authors of this paper do not assume independence of the 
source traces, but rather argue that this property holds for the 
residuals of an MVAR model if no instantaneous correlations 
in the data exist. Hence, in their MVARICA approach they 
apply ICA to the residuals of a sensor-space MVAR model. 

In this work, we first propose a single-step procedure to 
estimate all parameters (i.e. the mixing matrix and MVAR 
coefficients) of the linear mixing model of MVAR sources 
[10] based on temporal-domain convolutive ICA, instead of 
the combination of MVAR parameter fitting and demixing by 
instantaneous ICA. Furthermore, the approach enables us to 

Although, under some circumstances this approach can be justified [9]. 
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integrate a sparsity assumption on brain connectivity, i.e. on 
interactions between underlying brain sources via the Group 
Lasso penalty. The additional sparsity prior can avoid over- 
fitting in practical applications and yields more interpretable 
estimators of brain connectivity. We remark that it is hard to 
incorporate such sparsity priors in MVARICA, since MVAR 
is fit to the sensor signals where interactions (i.e. MVAR 
coefficients) are not at all sparse due to the volume conduction. 

The remainder of the paper is organized as follows. In 
Section [TTJ our procedure will be explained step by step. 
The correlated source model assumed in this paper will be 



the multivariate case, Granger causality also includes indirect 



causes not contained in non- vanishing 



(p) 



defined in II-B The identification procedure called connected 
sources analysis (CSA) based on the convolutive ICA will be 
introduced ( |II-C| ) and followed by its sparse version, sparse 
connected sources analysis (SCSA) with the Group Lasso prior 



( II-D ). The relations of our methods with existing approaches 
such as MVARICA [10] and CICAAR [11] will be elucidated 
in detail ( II-E| ). Finally, the optimization algorithms for CSA 
and SCSA will be explained (II-F). We implemented two 
versions for SCSA, one based on L-BFGS and the other by EM 
algorithm which is slower, but numerically more stable. The 
next section III will provide our experimental results on simu- 



lated data sequences emulating realistic EEG recordings. The 
plausibility of our correlated source model will be discussed 
with future research directions in the context of computational 



neuroscience (Section IV), before the concluding remarks 
(Section M). 



II. Connected Sources Analysis with Sparsity 
Prior 

A. MVAR for modeling causal interactions 

Autoregressive (AR) models are frequently used to define 
directed "Granger-causal" relations between time- series. The 
original procedure by Granger involves the comparison of two 
models for predicting a time series z^ containing either past 
values of Zi and Zj, or Z{ only [2]. If involvement of Zj leads 
to a lower prediction error, (Granger-causal) information flow 
from Zj to zi is inferred. Since this may lead to spurious 
detection of causality if both Z{ and Zj are driven by a common 
confounder z*, it is advisable to include the set {^i, ... , zm}\ 
{zi, Zj} of all other observable time series in both models. 

It has been pointed out, that pairwise analysis can be 
replaced by fitting one multivariate autoregressive (MVAR) 
model to the whole dataset, and that Granger-causal inference 
can be performed based on the estimated MVAR model 
coefficients (e.g., [5], [12]). Several connectivity measures are 
derived from the MVAR coefficents [3], [4], but probably the 
following definition is most straightforward from Granger's 
argument that the cause should always precede the effect. We 
say that time series Zi has a causal influence on time series 
Zj if the present and past of the combined time series Z{ and 
Zj can better predict the future of Zj than the present and 
past of Zj alone. In the bivariate case this is equivalent to 
saying that for at least one p G {1, . . . ,P}, the coefficient 
corresponding to the interaction between Zj and Z{ at the 
pth time-lag is nonzero (significantly different from zero). In 



B. Correlated sources model 

In this paper we propose a method for demixing the 
EEG/MEG signal into causally interacting sources. We start 
from the same model as in [10]: the sensor measurement is 
assumed to be generated as a linear instantaneous mixture of 
sources, which follow an MVAR model 

x(t) = Ms(t) (1) 
p 

s(t) = 5^£TMs(t-p)+e(f). (2) 
P =i 

Here, x(t) is the EEG/MEG signal at time t, M is a mixing 
matrix representing the volume conduction effect, s(t) is the 
demixed (source) signal. The sources at time t are modeled as 
a linear combination of their P past values plus an innovation 
term s(t), according to an MVAR model with coefficient 
matrices H^ p \ In the standard MVAR analysis, the innova- 
tion e(t) is a temporally- and spatially-uncorrelated Gaussian 
sequence. In contrast, we assume here that it is Ltd. in time 
and the components are subject to non-Gaussian distributions 
in order to apply blind source separation (BSS) techniques 
based on higher-order statistics [10], [11]. 

For simplicity, we deal with the case that the numbers of 
sensors and sources are equal and the mixing matrix M is 
invertible. When there exist less sources than sensors, the 
problem falls into the current setting after being preprocessed 
by PC A [10]. Under our model assumptions, the innovation 
sequence can be obtained by a finite impulse response (FIR) 
filtering of the observation, i.e. 

p 

e(i) = M _1 x(t) - H^M'^it - p) (3) 
P =i 

p 

= ^wMx(t-p), (4) 

where the filter coefficients are determined by the mixing 
matrix M and the MVAR parameters {H^} as 



M' 1 



p = 
p>0 



(5) 



Thanks to the non-Gaussianity assumption on the innovation 
e(t), we can use BSS techniques based on higher-order 
statistics to identify the inverse filter Since we 

would like to impose sparse connectivity as a plausible prior 
information later on, it is preferable to apply temporal-domain 
convolutive ICA algorithms. The obtained FIR coefficients 
{W^} directly identify the mixing matrix M and the MVAR 
model of the same order P. 

C Identification by convolutive ICA 

We use temporal-domain convolutive ICA for inferring 
volume conduction effects and causal interactions between 
extracted brain signals. The model parameters can be identified 
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based on the mild assumptions that the innovations are non- 
Gaussian and (spatially and temporally) independent. For EEG 
and MEG data, a super-Gaussian is prefered to a sub-Gaussian 
distribution, assuming that ongoing activity of brain networks 
is triggered by spontaneous local bursts. We here adopt the 
super-Gaussian sech-distribution that was proposed in [11]. 
The Likelihood of the data under the model is then 



P (Mt)}J =P+1 \{w^}) 



= \W^\ T ~ P fl n-sech(s d (i)) , 



(6) 



t=P+l d=l 

^p 



where e(t) = M^x^) - J2p =1 H^M' 1 ^ - p). The cost 
function to be minimized is the negative log-Likelihood 



C({WW}) = (P-T)log\WW\ 



T D 

E E lo g(^ sech (^w)) • 
=p+id=i ^ ' 



The solution of Eq. (([7])) leads to the estimators of the mixing 
matrix M and the MVAR coefficients {H^} via Eq. (g)). We 
will call this procedure Connected Sources Analysis (CSA). 

We remark that the temporal-domain algorithm of convolu- 
tive ICA has obvious indeterminacy due to permutations and 
sign flips. However, once we fix a rule to chose one from all 
candidates, the cost function can be considered as convex. 

D. Sparse connectivity as regularization 

In practice, we usually have to consider a long-range lag P 
to explain temporal structures of data sequences. However, 
this causes too many parameters to be estimated reliably. 
Maximum-Likelihood estimation may easily lead to overfit- 
ting, especially if T is small. For this reason, it is advisable to 
adopt a regularization scheme. Several authors have suggested 
that the complexity of MVAR models can be reduced by 
shrinking MVAR coefficients towards zero. In [12] and [13], 
MVAR-based functional brain connectivity is estimated from 
functional magnetic resonance imaging (fMRI) recordings us- 
ing an £i-norm based (Lasso) penalty, which has the property 
of shrinking some coefficients exactly to zero. In [5] it is 
pointed out, that, by using a so-called Group Lasso penalty, 

whole connections between time-series can be pruned at once. 

(v) 

In this approach, all coefficients B.\-' , p — 1 , . . . , P modeling 
the information flow from s$ to Sj are grouped together and 
can only be pruned jointly. From the practical standpoint such 
sparsification is very appealing, since fewer connections are 
much easier to interpret. But assuming sparse connectivity in 
fMRI data might also be justified from a neurophysiological 
point of view, since under appropriate experimental conditions 
only a few macroscopic brain areas are expected to show 
significant interaction. This reasoning also applies to EEG and 
MEG data. 

We note that, besides the penalty-based approach, other 
strategies for obtaining sparse connectivity graphs exist. For 
example, post-hoc sparsification can be achieved for dense 
estimators by means of statistical testing [5], [14]. However, 
due to the compelling built-in regularization, we here adopt 
Group Lasso sparsification. 



Before applying our regularization to the cost function of 
the correlated sources model, it is important to note that 
the sparsity assumption is only reasonable for the MVAR 
coefficients {H^}, but not for the matrices which 

combine MVAR coefficients and the instantaneous demixing. 
Hence, in order to apply sparsifying regularization, one has to 
split the parameters into demixing and MVAR parts again, as 
in the original model Eq. (([l}). Since the off diagonal elements 
{H^} correspond to interaction between sources, we propose 
to put a Group Lasso penalty on them analogously to [5]. I.e., 
we penalize the sum of the £2 -norms of each of the groups 

Let B := M" 1 (= W^), s(t) = Bx(t) and s(t) = 
J2p=i H^s(t — p). The regularized cost function is 



£ SCSA (B,{W P >}) 

= (P — T) log \B\ + A 



H 



(i) 

df ' 



1 



(8) 



T D 

- 5Z log ( - sech ( s d0) - §d(t)) 
t=p+id=i ^ 7r 

A being a positive constant. The solution to Eq. (([5])) for a 
choice of A is called the Sparsely-Connected Sources Analysis 
(SCSA) estimate. 

E. Relation to other methods 

The proposed method extends previously suggested MVAR- 
based sparse causal discovery approaches [5], [12] by a linear 
demixing, which is appropriate for EEG/MEG connectiv- 
ity analysis. Although the correlated sources model Eq. ([T]) 
leads to an MVAR model of the observation sequence [10], 
sparsity of the coefficients cannot be expected after mixing 
by volume conduction effects. Our method compares with 
MVARICA [10], which uses the same model Eq. {j}, but 
estimates its parameters differently. More precisely, the authors 
of MVARICA suggest to first fit an MVAR model in sensor- 
space. The demixing can then be obtained by performing 
instantaneous ICA on the MVAR innovations, i.e., a dedicated 
contrast function (Infomax) is used to model independence 
of the innovations. The obtained sources follow an MVAR 
model with time-lagged effects (interactions), but ideally no 
instantaneous correlations (as caused by volume conduction). 

It also turns out that the model Eq. ([]]) is very similar to 
the convolutive ICA (cICA) [11], [15]-[17] model. The only 
difference is that Eq. ([T]) employs a FIR filter to extract the 
innovations, while an infinite response filter (IIR) is usually 
used in the cICA literature (see, e.g., [11]). This discrepancy 
is explained by the different philosophies that are associated 
with both methods. While in our approach the innovations e(t) 
arise as residuals of a finite-length source-MVAR model, cICA 
understands them as sources of a finite-length convolutional 
mixture. Nevertheless, our unregularized cost function can 
be regarded as a maximum-Likelihood approach to an IIR 
version of convolutive ICA. This leads us also to a new view 
of convolutive ICA as performing an instantaneous demixing 
into correlated sources. Hence, it is possible to conduct source 
connectivity analysis using cICA (see Fig. [T] for illustration). 
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Compared to MVARICA and time-domain implementations 
of convolutive ICA such as CICAAR [11], our formulation has 
the advantage that sparse connectivity can easily be modeled 
by an additional penalty. This is not possible for CICAAR, 
because CICAAR only indirectly estimates the MVAR co- 
efficients through their inverse filters. However, these are 
generally nonsparse, even if the true connectivity structure 
is sparse. Inverting the inverse coefficients is also generally 
not possible (recall, that convolutive ICA is equivalent to 
an infinite-length source-MVAR model). It is furthermore not 
possible to introduce a sparse regularization for MVARICA, 
since this method carries out the MVAR-estimation step in 
sensor-space, where no sparsity can be assumed. 

By variation of the regularization parameter, our method 
is able to interpolate between a fully-correlated source model 
(comparable to convolutive ICA) and a model which allows no 
cross-talk between sources. Interestingly, the latter extreme can 
be seen as a variant of traditional instantaneous ICA, in which 
independence is measured in terms of mutual predictability 
with a Granger-type criterion. 



We plug the gradient into a limited memory Broyden- 
Fletcher-Goldfarb-Shanno (L-BFGS) optimizer [18 J] and ob- 
serve that the algorithm always converges to the global op- 
timum, while only the signs and order of the components 
may depend on the initialization. We use = I and 

=0,p= 1,...,P as a default initializer. 

2) SCSA via a modified L-BFGS algorithm: Using sparse 
regularization, two additional difficulties emerge compared 
to the unregularized cost function. First, using the factor- 
ization Eq. ([5]) the guaranteed convergence to the minimum 
observed for CSA is unlikely to be retained. Furthermore, 
the function Eq. ^ is not differentiable, when one of the 
terms \\(H^\ . . . , i7^) T || 2 , d ^ f becomes zero, which is 
expected to be the case at the optimum. 

For tackling these difficulties we here propose to use a 
modified version of the L-BFGS algorithm, which allows joint 
nonlinear optimization of B and {H^}, while taking special 
care of the nondifferentiability of the regularizer. The gradient 
of Eq. ^ is obtained as 



{W®} (FIR) 
8 4 - ► X 



(FIR) 
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(ARfit) 
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Fig. 1. Relations between (a) SCSA, (b) MVARICA and (c) CICAAR. 
All approaches assume a non-Gaussian innovation sequence e. SCSA and 
MVARICA fit an IIR model to the observed sequence x, while CICAAR 
assumes an FIR filter for it. Therefore, in SCSA and MVARICA the inverse 
filter from x to the innovation e is an FIR. MVARICA is a two step approach 
with AR fitting to the observed sequence x and spartial demixing of the 
innovation Me obtained in the first step. On the other hand, SCSA is a one- 
step approach which compute the inverse FIR filter by convolutive ICA. We 
remark that the AR fitting in MVARICA relies only on the second order 
statistics, which may cause the performance drops compared to CSA. 



F Optimization 

1) CSA: The gradient of the unregularized cost function 
Eq. ^ is obtained as 



dC 



dW, 



(p) 



5{p)({P-T)Wto T e d 



+ tanh ( -p) ) x ^ -p) > (9) 

t=P+l \p=0 / 

where := W^ T e d , i.e. the <i-th column vector of 



on df 



^2 tanh(s d (£) -s d (t)) s f (t - p) 



t=p+i 
X 



( H {i) H (p)\ 



(10) 



and 



dC s ' 



dB d 



, (P- T )B- T e d 

T D I 

+ J2 J2 <tanh(s d (i) -s d (t j) 

t=P+l d=l I 

x (x(t)-px d (t-p)H^ .(11) 

Our modified L-BFGS algorithm checks before 
each gradient evaluation, whether some terms 
. . . , Hjp) T \\ 2 , d 7^ / are already (close to) zero. 
If any of the terms equals zero, the gradient is not defined 
uniquely but as a set (subdifferential). Nevertheless it is 
straightforward to compute the element of the subdifferential 
with minimum norm, whose sign inversion is always a 
descent direction. Care must be taken because in practice we 
would not find any of the above terms exactly equal to zero. 
Thus we truncate the elements of H corresponding to the 
terms with small norms below some threshold to zero before 
computing the minimum norm subgradient. If the minimum 
is indeed attained at the truncated point, the minimum norm 
subgradient will be zero. Otherwise the subgradient will 
drive the solution out of zero. Further care must be taken in 
practice to prevent the solution from oscillating in and out of 
some zero. 



z We 



implementation by Naoaki Okazaki, 



http : //www . chokkan . org/ software /liblbf gs /. 
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We find that, using the outlined optimization procedure, 
sparse solutions can be found in shorter time, if the solution of 
the unregularized cost function is used as the initializer. The 
starting point can be obtained using the inverse transformation 
of Eq. ([5]), which is given by 



B 

H (p) 



-W ip) B~\ 



p>0 



(12) 
(13) 



3) SCSA via an EM algorithm: Using joint optimization of 
B and {H^}, the heuristic pruning of connections might in 
some cases lead to suboptimal solutions regarding the com- 
posite cost function. For this reason, we present an alternative 
optimization scheme, which does not require any heuristic 
step. The idea here is to alternate between the estimation of 
both unknowns. Doing so can be justified as an application of 
the Expectation Maximization (EM) algorithm (see [19]). 

Estimation of B given {H^} (here called E-step) amounts 
to solving an unconstrained nonlinear optimization problem. 
Importantly, this problem is also convex, in contrast to the joint 
approach to SCSA parameter fitting. The convexity follows 
from the concavity of log \X\ and log(sech(ax)) for constant a 
(and from the fact that the sum of convex functions is convex.). 
The great advantage of convex problems is, that they feature a 
unique (local and global) minimum. In our case, the objective 
is smooth, so the minimum is guaranteed to be found by the 
L-BFGS algorithm, making use of the gradient in Eq. (pj}. 

Optimization with respect to {H^} for fixed B (M-step) 
is more involved, since the nondifferentiable Group Lasso 
regularizer remains. Smooth optimization methods like L- 
BFGS are unlikely to find the exact solution here. However, 
this problem is not as difficult as the joint optimization 
problem, since it is convex. This can be seen from the fact 
that it is composed of a sum of — log(sech(ax)) terms (loss 
function) and the Group Lasso term (regularizer), which is 
a sum of £2 -norms and thus convex. Hence we can solve 
this problem using the Dual Augmented Lagrangian (DAL) 
procedure [20], which has recently been introduced as a 
method for minimizing arbitrary convex loss functions with 
additional Lasso or Group Lasso penalties. Application of 
DAL requires the loss function and its gradient, the convex 
conjugate (Legendre transform) of the loss function, as well as 
gradient and Hessian of the conjugate loss. Let s(t) = B^(t) 
be the demixed sources and s(t) = Ep=iH {p) s(t-p) be 
their autoregressive approximations. The loss function in terms 
of s is defined as 



conjugate loss function is defined on the interval [—1,1] and 
evaluates to 




&d(t)s d (t) - log 



sech (s d (t) - s d (t)) \ 

7T J 



log — 



a d (t) 



-a d m 1 

log — 



• a d (t) 



& d (t)s d (t) + log- 



The gradient of the conjugate loss is given by 



<9£> M (a) 



1, 1 

- log — 
2 8 1 



d*d(t) 2 & l-a d (t) 

The Hessian is diagonal with elements 
<9 2 £> M (a) 1 



s d (t) • 



da d (t) 2 2(1 -a*(*)) 



(16) 



(17) 



(18) 



Having defined the E- and M-steps, we have turned a 
nonconvex estimation problem into a sequence of two convex 
problems, which can both be solved exactly. A final estimate 
of the model parameters can now be obtained by alternating 
between E- and M-steps until convergence. 

G. Treating source autocorrelations 

Diagonal parts of the MVAR matrices {H^} model the 
sources' autocorrelation and should preferably not be pruned. 
However, in some cases numerical stability can be increased 
if these variables are also penalized, especially if D and P 
are large. For this reason, we use a slight variation of the cost 
function Eq. d8l) in practice, which includes 



-n-H , • • • , -n-n , • • • , J^-DD-) ' ' 



' DD ) 



(19) 



as an additional penalty term. The augmented objective func- 
tion can be minimized using the techniques presented in 
Section HLFl 



III. Performance under realistic conditions 

We conducted the following simulations in order to assess 
the performance of the proposed source connectivity analysis 
compared to those of existing approaches. 



£ M (s) 



The gradient is 



D 

E lo S ( -sech(s d (t) - s d (t)) ) . (14) 



=P+1 d=l 



ds d (t) 



tanh(s d (t) - s d (t)) 



(15) 



Let a d (t) {d = 1,...,D, t = P + 1, . . . , T) denote the 
dual variables associated with the Legendre transform. The 



A. Data generation 

We simulated seven time- series (pseudo- sources) of length 
N = 2000 according to an MVAR model of order P = 4. 
Seven out of the forty-two possible interactions were modeled 
by allowing the corresponding offdiagonal MVAR coefficients 
, d 7^ /, 1 < p < P to be nonzero. The innovations were 
drawn from the sech-distribution (Note that the assumption of 
non-Gaussianity is crucial for recovering mixed sources.). 

The pseudo-sources were mapped to 118 EEG channels 
using the theoretical spread of seven randomly placed dipoles. 
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The spread was computed using a realistic forward model 
[21] which was built based on anatomical MR images of the 
"Montreal head" [22]. See Fig. [2] for an example illustrating 
the data generation. 

In reality, measurements are never noise-free and the fol- 
lowing model holds rather than Eq. ([T]) 



TABLE I 

The six types of noise used in the simulations. Noise with 
temporal correlation structure was created using univariate 
ar models of order 20. spatial correlation was introduced 
using the forward model. we distinguish between the case, 
where noise sources coincide with the true dipoles ( a ) and the 
case in which noise from all brain sites contributes to the 
measurements ( b ) 



x(t) 



Ms(t) + £(t) . 



(20) 



Since none of the methods compared here (see below) 
explicitly models a noise term, it is important to evaluate their 
robustness to model violation. To this end, we constructed 
additional variants of the pseudo-EEG dataset by adding six 
different types of noise £. The six variants (N1-N6) are 
summarized in TABLE |I| These variants differ in their degree 
of spatial and temporal correlation as follows. In variants 
Nl and N4, = 1, . . . , M were drawn independently 

for each sensor, i.e., have no spatial correlation. For variants 
N2 and N5 noise terms £*(t),i = 1,...,M were drawn 
independently for each source. In this case, sources and noise 
contributions to the EEG share the same covariance given by 
the mixing matrix M, i.e., x(t) = M((s(t) + £*(£)). For the 
last variants N3 and N6, spatially independent noise sources 
were simulated at all nodes of a grid covering the whole 
brain, yielding the model x(t) = Ms(t) + M*£*(£). Here, 
in contrast to the previous model, noise contributions are not 
collinear to the sources. We further distinguish between noise 
sources with and without temporal structure. In variants (Nl- 
N3), noise terms were drawn i.i.d. from a normal distribution 
at each time instant t. In variants N4-N6, the temporal structure 
was determined by a univariate AR model of order 20, i.e., 

Note that, since no time-delayed dependencies between 
noise sources were modeled, no additional Granger-causal 
effects were introduced by the noise. We used a signal-to-noise 
ratio (SNR) of 2 in all experiments, where SNR is defined as 



SNR 



|lM(s(l),...,s(T))||^ 

iKr(i),...,r(T))ii^ 



(21) 



and || • || jr is the Frobenius norm of a matrix. 

Finally, PCA was applied to the pseudo-EEG to reduce the 
dimensionality to D = 7 (the original number of sources) by 
taking just the seven strongest PCA components. One-hundred 
datasets with different realisations of MVAR coefficients, 
innovations and noise were constructed for each category. 

B. Methods 

We tested the ability of ICA, MVARICA, CICAAR and 
the two proposed methods CSA and SCSA to reconstruct 
the seven sources and their connectivity structure. Although 
the goal of instantaneous ICA is fundamentally different to 
source connectivity analysis, it was also included here in the 
comparison. This is since, even if independence of the sources 
is not fulfilled, ICA might still provide as-least-as-possible 
dependent components, the connectivity of which might be 
analyzed. The ICA variant used here is based on temporal 
decorrelation [23]-[26] (implemented by fast approximate 





independent in time 


correlated in time 


independent in sensors 


Nl 


N4 


correlated in sensors a 


N2 


N5 


correlated in sensors b 


N3 


N6 



joint diagonalization [27]). The number of temporal lags was 
set to 100. 

MVARICA, CICAAR, CSA and SCSA were tested with 
P G {1,2,. ..,7} temporal lags, where four is the true MVAR 
model order for CSA, SCSA and MVARICA. CICAAR has 
the disadvantage here, that it may generally require extended 
temporal filters for reconstructing sources following model 
Eq. ([]]). However, due to computation time constraints, P = 7 
was taken as the maximum lag also for this method. For 
MVARICA and CICAAR, we used implementations provided 
by the respective authors. These implementations adopt the 
Bayesian Information Criterion (BIC) for selecting the appro- 
priate number of time lags. The same criterion was used to 
select the model order in CSA and SCSA. The regularization 
constant A of SCSA was set by 5-fold cross-validation. SCSA 
estimates of and B were obtained either jointly 

using the modified L-BFGS algorithm or alternately using 
20 additional EM steps. These variants are named SCSA and 
SCS_EM here, respectively. 



C. Performance measures 

The most important performance criterion is the reconstruc- 
tion of the mixing matrix, since all other relevant quantities can 
basically be derived from it. All considered methods provide 
an estimate M _1 of the demixing, which can be inverted to 
yield an estimated mixing matrix. The columns of the mixing 
matrix correspond to spatial field patterns of the estimated 
sources, but unfortunately these patterns can generally only 
be determined up to sign, scale and order. For this reason, 
optimal pairing of true and estimated patterns as described in 
[28] was performed. The similarity measure between patterns 
was slightly modified compared to the one used in [28]. We 
used the goodness-of-fit achieved by a linear least-squares 
regression of one to another pattern. For a true pattern and 
an estimated pattern Mf the optimal regression coefficient is 



c(M d ,M f ) 
and the goodness-of-fit (GOF) is 

\\cM f 



MjM d 



GOF (M d , Mf) = 



M, 



\M, 



(22) 



(23) 
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Having found the optimal pairing, the colums of M were 
permuted and scaled to approximate M as good as possible 
using the optimal regression coefficients. The goodness-of-fit 
with respect to the whole matrix M was used to evaluate the 
quality of the different decompositions. Additionally, using 
the optimally-matched mixing patterns, dipole scans were 
conducted and the deviation of the obtained dipole locations 
from the true ones was measured. A typical example of a 
mixing pattern estimated by SCSA and the corresponding 
reconstructed dipole is shown in Fig. [2] 

Finally, causal discovery according to [5] was carried out 
on the demixed sources. The exact technique used was MVAR 
estimation with Ridge Regression. For the MVAR parameters 
estimated by Ridge Regression an approximate multivariate 
Gaussian distribution can be derived, which was used to test 
the coefficients for beeing significantly different from zero. An 
influence from Si to s« was defined, if the p- value of one of the 

(v) 

coefficients H^ J ,p = 1, . . . , P fell below the critical value. 
As a third performance criterion, the area under curve (AUC) 
score for correctly discovering the interaction structure was 
calculated by varying the significance threshold and comparing 
estimated and true connectivity matrix for each threshold. 
Note that this way of connectivity estimation was pursued 
here in order to treat all methods equally. This was necessary, 
since not all methods provide built-in connectivity estimates. 
For SCSA, however, interaction analysis could as well have 
been done by directly examining MVAR coefficients. Note 
further, that using Ridge Regression based testing, the non- 
Gaussianity of the source MVAR innovations is only indirectly 
used through the use of the demixing matrix, but not for actual 
MVAR estimation. For this reason, the MVAR coefficients 
directly estimated by SCSA may be preferred to a subsequent 
Ridge Regression step when using SCSA in practice. 

D. Results 

Fig. [3] shows, how well the mixing matrix was approximated 
by the different approaches. One boxplot is drawn for the 
noiseless case (NO) and each of the six noisy variants (Nl- 
N6, see Table [I]). The plots show the median performance over 
100 repetitions, as well as the lower and upper quartiles and 
the extremal values. Outliers (red crosses) were removed. As a 
result of the simulations, SCSA typically achieves the smallest 
reconstruction error, followed by CSA, CICAAR, MVARICA 
and ICA. In many cases, these differences are also significant, 
as indicated by notches in the boxes. 

Correct (de-)mixing matrix estimation affects both the lo- 
calization error achievable by applying inverse methods to the 
estimated patterns and the error of any connectivity analysis 
performed at the demixed sources. As a result of good mixing 
matrix approximation, SCSA also achieves smaller dipole 
localization errors than all other methods, except in one 
scenario (shown in Fig. [4]). The same situation occurs when it 
comes to estimating the connectivity between sources (Fig. [5}. 

Interestingly, the higher numerical stability we observed 
for the EM variant of SCSA compared to joint parameter 
estimation only sometimes leads to superior performance. This 
may be related to our observation, that the difference between 




(a) Simulated Dipole (b) Corresponding EEG Pattern 




(c) Estimated Dipole (d) Estimated EEG Pattern 



Fig. 2. Example of simulated data (noise type Nl) and corresponding 
reconstruction by SCSA. (a) Simulated dipole. (b) Field pattern describing 
the dipole' s influence on the EEG (one column of M). (d) Field pattern as 
estimated by SCSA from noisy EEG time series, (c) Reconstructed dipole, 
obtained from the estimated pattern. 

the implementations becomes large only for excessively large 
amounts of regularization, which are not optimal in terms of 
the cross-validation criterion. Another reason might be that the 
instability of the MVAR coefficients around zero does not play 
a crucial role in our current evaluation, since all performance 
measures used here were solely derived from the demixing 
matrix. 

Regarding noise influence it might be said that the relative 
degradation of performance in the presence of noise is the 
same for all methods. Generally, noise that is collinear to 
the sources (N2/N5) seems to be less problematic than noise 
that is uncorrelated across sensors (N1/N4) and noise with 
arbitrary spatial correlation structure (N3/N6). Judging from 
mixing matrix approximation and dipole localization errors, 
the temporal structure of the noise seems not to affect the 
performance much. However, small errors in the (de-)mixing 
matrix can have quite a negative effect on the connectivity 
estimation, as can be seen in the right part of Fig. [5] 

The time each method consumed on average for processing 
one dataset is shown in Fig. [6] Most methods finish in 
rather short time, while the EM implementation of SCSA 
is in medium range and CICAAR requires the longest time. 
However, for SCSA there is still room for improvement, 
since the regularization parameter of this method is currently 
selected by the cross-validation procedure, which could be 
changed. 

IV. Discussion 

Let us recall the assumptions we make to identify individual 
brain sources and to estimate their interactions. While ICA 
results in a unique decomposition assuming statistical inde- 
pendence, such an assumption is inconsistent when studying 



8 



Mixing Matrix Approximation Error 




SCSA_EM 

SCSA 
co CSA 
z CICAAR 

MVARICA 

ICA 




SCSA_EM 

SCSA 
co CSA 
2 CICAAR 

MVARICA 



Connectivity Error 




.2 0.3 0.4 



SCSA_EM ttt-H- 
SCSA -«+ 

o csa m- +h- 

- CICAAR 3-HWh 
MVARICA CO- + 

ICA CC=1- - 7 



■ ++-H-+ + 
■++++ + -H- 



0.2 0.3 0.4 



SCSA_EM i- - 
SCSA 
to CSA 

2 CICAAR i 1 

MVARICA - - 
ICA 




Fig. 3. Estimation errors of the mixing matrix according to the goodness-of-fit 
(GOF) criterion. Results are shown for the proposed (Sparsely-) Connected 
Sources Analysis variants (SCSA_EM, SCSA, CSA) and three alternative 
approaches (CICAAR, MVARICA, ICA). Different subfigures depict the 
methods' performance in the noiseless cass (NO), as well as in the presence 
of different types of noise (N1-N6, see TABLE [I}. 



Fig. 5. Estimation errors regarding the source connectivity structure as 
measured by fitting an MVAR model subsequently to the demixed sources and 
testing the obtained coefficients for significant interaction. The performance 
measure reported is the area under the curve (AUC) score obtained by varying 
the significance level. 
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Fig. 4. Localization errors of dipole fits conducted on the estimated mixing 
field patterns. Results are shown for the proposed (Sparsely-) Connected 
Sources Analysis (SCSA_EM, SCSA, CSA) variants and three alternative 
approaches (CICAAR, MVARICA, ICA). Different subfigures depict the 
methods' performance in the noiseless cass (NO), as well as in the presence 
of different types of noise (N1-N6, see TABLE [IJ. 



brain interactions. However, all neural interactions require a 
minimum delay well within the temporal resolution of electro- 
physical measurements of brain activity. Hence, it makes sense 
to assume independent innovation processes and to model all 
interactions explicitly using AR matrices. In relation to ICA 
we pay some price for that: In our case, independence is 
exploited effectively on reduced information contained in the 
residuals of the model. In principle, this can be a cause for 
less stable estimates. To increase stability, we have included 
sparsity assumptions based on the idea that only a few brain 
connections can be as strong to be observable in EEG data 
which is especially the case in the presence of artifacts and 
background noise. 

We emphasize that we assume a linear dynamical model 
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Fig. 6. Average runtime of the proposed (Sparsely-) Connected Sources 
Analysis variants (SCSA_EM, SCSA, CSA) and three alternative approaches 
(CICAAR, MVARICA, ICA), taken over all experiments conducted for this 
study. 



and non-Gaussian innovation processes, i.e. the only cause of 
non-Gaussianity is the innovation process itself. Real brain 
networks are, of course, more complicated. However, the 
question whether nonlinear dynamical models may improve 
the results or are even essential for a correct decomposition 
is beyond the scope of this paper and will be addressed in 
the future. Similarly, we assumed the total number of sources 
to be less or equal the number of channels. Apparently, the 
significance of this problem decreases when using a large 
number of channels. 

V. Conclusion 

Analysing the functional brain connectivity is a hard prob- 
lem, since volume conduction effects in EEG/MEG measure- 
ments can give rise to spurious conductivity. In this work we 
have established a novel connectivity analysis method SCSA 
that overcome these problems in an elegant and numerically 
appealing manner using Group Lasso. In detail, EEG is mod- 
eled as a linear mixture of correlated sources, then we estimate 
jointly the demixing process and the MVAR model (which 
is the model basis for the correlated sources). For this we 
assume that the innovations driving the source MVAR process 
are super-Gaussian distributed and (spatially and temporally) 
independent. To avoid overfitting we regularize the model 
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using the Group Lasso penalty. In this manner we can achieve 
a data driven interpolation between two extremes: a source 
model that has full correlations, i.e. convolutive ICA and 
conventional ICA that does not allow for cross-talk between 
the extracted sources. In between, our method extracts a sparse 
connectivity model. We demonstrate the usefulness of SCSA 
with simulated data, and compare to a number of existing 
algorithms with excellent results. 

Future work will study the link between methods for com- 
pensating non-stationarity in data such as Stationary Subspace 
Analysis (SSA, [29]) and our novel connectivity assessment. 
In addition, we aim to localize the extracted components 
of connectivity using distributed source models to enhance 
physiological interpretability (e.g. [30], [31]). 
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